NII/LLMC released CC Audio and Archive.org Audio Dataset. URL lists, metadata, and a downloader covering 48,000+ hours of Japanese audio. What it actually contains and how it fits into TTS, ASR, and audio model training.
A technical walkthrough of Alibaba's Qwen3-Omni-30B-A3B. An omni-modal model that activates only 3B out of 30B and responds with speech from text/image/audio/video inputs. The article organizes the Thinker–Talker architecture, benchmarks, and the overall Qwen3 MoE family.